Finding Frequent Subsequences in a Set of Texts
نویسندگان
چکیده
Given a set of strings, the Common Subsequence Automaton accepts all common subsequences of these strings. Such an automaton can be deduced from other automata like the Directed Acyclic Subsequence Graph or the Subsequence Automaton. In this paper, we introduce some new issues in text algorithm on the basis of Common Subsequences related problems. Firstly, we make an overview of different existing automata, focusing on their similarities and differences. Secondly, we present a new automaton, the Constrained Subsequence Automaton, which extends the Common Subsequence Automaton, by adding an integer q denoted quorum.
منابع مشابه
Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM
Finding repetitive subsequences in genome is a challengeable problem in bioinformatics research area. A lot of approaches have been proposed to solve the problem, which could be divided to library base and de novo methods. The library base methods use predetermined repetitive genome’s subsequences, where library-less methods attempt to discover repetitive subsequences by analytical approach...
متن کاملIncremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams
In recent years, new applications emerged that produce data streams, such as stock data and sensor networks. Therefore, finding frequent subsequences, or clusters of subsequences, in data streams is an essential task in data mining. Data streams are continuous in nature, unbounded in size and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to effec...
متن کاملMining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are most important portion of biological sequences, which play crucial roles on corresponding sequence’s fold and functionality. Biggest class of the repetitive subsequences is “Transposable Elements” which has its own sub-classes upon contexts’ structures. Many researches have been performed to criticality determine the structure and function of repetitiv...
متن کاملEfficiently Mining Closed Subsequences with Gap Constraints
Mining frequent subsequence patterns from sequence databases is a typical data mining problem and various efficient sequential pattern mining algorithms have been proposed. In many problem domains (e.g, biology), the frequent subsequences confined by the predefined gap requirements are more meaningful than the general sequential patterns. In this paper we re-examine the closed sequential patter...
متن کاملThe Probabilistic Algorithm for Mining Frequent Sequences
The subject of the paper is to analyze the problem of the frequency of the subsequences in large volume sequences (texts, databases, etc.). A new algorithm ProMFS for mining frequent sequences is proposed. It is based on the estimated probabilistic-statistical characteristics of the appearance of elements of the sequence and their order. The algorithm builds a new much shorter sequence and make...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008